1,179 research outputs found

    Challenges for Chemoinformatics Education in Drug Discovery

    Surveys the curriculum developed at Indiana University for teaching cheminformatics in the IU School of Informatics.

    Modeling and visualizing uncertainty in gene expression clusters using Dirichlet process mixtures

    Although clustering methods have rapidly become one of the standard computational approaches in the literature on microarray gene expression data, little attention has been paid to uncertainty in the results obtained. Dirichlet process mixture (DPM) models provide a nonparametric Bayesian alternative to the bootstrap approach to modeling uncertainty in gene expression clustering. Most previously published applications of Bayesian model-based clustering methods have been to short time series data. In this paper, we present a case study of the application of nonparametric Bayesian clustering methods to the clustering of high-dimensional non-time-series gene expression data using full Gaussian covariances. We use the probability that two genes belong to the same cluster in a DPM model as a measure of the similarity of these gene expression profiles. Conversely, this probability can be used to define a dissimilarity measure, which, for the purposes of visualization, can be input to one of the standard linkage algorithms used for hierarchical clustering. Biologically plausible results are obtained from the Rosetta compendium of expression profiles which extend previously published cluster analyses of these data.
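
The similarity measure described in this abstract can be illustrated with a minimal sketch. It assumes that cluster labels for each gene have already been drawn from a DPM sampler (the label draws below are made up for illustration), estimates the co-clustering probability as the fraction of draws in which two genes share a label, and takes one minus that probability as the dissimilarity. The abstract does not state the exact form of the dissimilarity, so the `1 - p` choice and the function names are assumptions.

```python
def coclustering_probability(label_samples):
    """Estimate, for every pair of items, the fraction of posterior
    draws in which the two items carry the same cluster label."""
    n_samples = len(label_samples)
    n_items = len(label_samples[0])
    counts = [[0] * n_items for _ in range(n_items)]
    for labels in label_samples:
        for i in range(n_items):
            for j in range(n_items):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / n_samples for c in row] for row in counts]


def dissimilarity(prob):
    """Assumed dissimilarity: one minus the co-clustering probability."""
    return [[1.0 - p for p in row] for row in prob]


# Hypothetical label draws for 4 genes over 4 sampler iterations.
# Label values are arbitrary; only equality within a draw matters.
samples = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 1, 2, 2],
]

prob = coclustering_probability(samples)
dist = dissimilarity(prob)

for row in dist:
    print(row)
```

The resulting matrix is symmetric with a zero diagonal, so it could be passed to a standard hierarchical linkage routine for visualization, as the abstract suggests.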

    Improving integrative searching of systems chemical biology data using semantic annotation

    Background: Systems chemical biology and chemogenomics are considered critical, integrative disciplines in modern biomedical research, but require data mining of large, integrated, heterogeneous datasets from chemistry and biology. We previously developed an RDF-based resource called Chem2Bio2RDF that enabled querying of such data using the SPARQL query language. Whilst this work has proved useful in its own right as one of the first major resources in these disciplines, its utility could be greatly improved by the application of an ontology for annotation of the nodes and edges in the RDF graph, enabling a much richer range of semantic queries to be issued. Results: We developed a generalized chemogenomics and systems chemical biology OWL ontology called Chem2Bio2OWL that describes the semantics of chemical compounds, drugs, protein targets, pathways, genes, diseases and side-effects, and the relationships between them. The ontology also includes data provenance. We used it to annotate our Chem2Bio2RDF dataset, making it a rich semantic resource. Through a series of scientific case studies we demonstrate how this (i) simplifies the process of building SPARQL queries, (ii) enables useful new kinds of queries on the data and (iii) makes possible intelligent reasoning and semantic graph mining in chemogenomics and systems chemical biology. Availability: Chem2Bio2OWL is available at http://chem2bio2rdf.org/owl. The document is available at http://chem2bio2owl.wikispaces.com.

    Variable Selection and Model Averaging in Semiparametric Overdispersed Generalized Linear Models

    We express the mean and variance terms in a double exponential regression model as additive functions of the predictors and use Bayesian variable selection to determine which predictors enter the model, and whether they enter linearly or flexibly. When the variance term is null we obtain a generalized additive model, which becomes a generalized linear model if the predictors enter the mean linearly. The model is estimated using Markov chain Monte Carlo simulation and the methodology is illustrated using real and simulated data sets. Comment: 8 graphs, 35 pages.

    From Starburst to Quiescence: Testing AGN feedback in Rapidly Quenching Post-Starburst Galaxies

    Post-starbursts are galaxies in transition from the blue cloud to the red sequence. Although they are rare today, integrated over time they may be an important pathway to the red sequence. This work uses SDSS, GALEX, and WISE observations to identify the evolutionary sequence from starbursts to fully quenched post-starbursts in the narrow mass range $\log M(M_\odot) = 10.3-10.7$, and identifies "transiting" post-starbursts which are intermediate between these two populations. In this mass range, $\sim 0.3\%$ of galaxies are starbursts, $\sim 0.1\%$ are quenched post-starbursts, and $\sim 0.5\%$ are the transiting types in between. The transiting post-starbursts have stellar properties that are predicted for fast-quenching starbursts and morphological characteristics that are already typical of early-type galaxies. The AGN fraction, as estimated from optical line ratios, of these post-starbursts is about 3 times higher ($\gtrsim 36 \pm 8\%$) than that of normal star-forming galaxies of the same mass, but there is a significant delay between the starburst phase and the peak of nuclear optical AGN activity (median age difference of $\gtrsim 200 \pm 100$ Myr), in agreement with previous studies. The time delay is inferred by comparing the broad-band near NUV-to-optical photometry with stellar population synthesis models. We also find that starbursts and post-starbursts are significantly more dust-obscured than normal star-forming galaxies in the same mass range. About $20\%$ of the starbursts and $15\%$ of the transiting post-starbursts can be classified as "Dust-Obscured Galaxies" (DOGs), while only $0.8\%$ of normal galaxies are DOGs. The time delay between the starburst phase and AGN activity suggests that AGN do not play a primary role in the original quenching of starbursts but may be responsible for quenching later low-level star formation during the post-starburst phase. Comment: 30 pages, 18 figures, accepted to ApJ.

    PubChemSR: A search and retrieval tool for PubChem

    Background: Recent years have seen an explosion in the amount of publicly available chemical and related biological information. A significant step has been the emergence of PubChem, which contains property information for millions of chemical structures, and acts as a repository of compounds and bioassay screening data for the NIH Roadmap. There is a strong need for tools designed for scientists that permit easy download and use of these data. We present one such tool, PubChemSR. Implementation: PubChemSR (Search and Retrieve) is a freely available desktop application written for Windows using Microsoft .NET that is designed to assist scientists in search, retrieval and organization of chemical and biological data from the PubChem database. It employs SOAP web services made available by NCBI for extraction of information from PubChem. Results and Discussion: The program supports a wide range of searching techniques, including queries based on assay or compound keywords and chemical substructures. Results can be examined individually or downloaded and exported in batch for use in other programs such as Microsoft Excel. We believe that PubChemSR makes it straightforward for researchers to utilize the chemical, biological and screening data available in PubChem. We present several examples of how it can be used.

    The identification of post-starburst galaxies at z∼1 using multiwavelength photometry: a spectroscopic verification

    Despite decades of study, we still do not fully understand why some massive galaxies abruptly switch off their star formation in the early Universe, and what causes their rapid transition to the red sequence. Post-starburst galaxies provide a rare opportunity to study this transition phase, but few have currently been spectroscopically identified at high redshift (z > 1). In this paper, we present the spectroscopic verification of a new photometric technique to identify post-starbursts in high-redshift surveys. The method classifies the broad-band optical–near-infrared spectral energy distributions (SEDs) of galaxies using three spectral shape parameters (supercolours), derived from a principal component analysis of model SEDs. When applied to the multiwavelength photometric data in the UKIDSS Ultra Deep Survey, this technique identified over 900 candidate post-starbursts at redshifts 0.5 [...] 5 Å) and Balmer break, characteristic of post-starburst galaxies. We conclude that photometric methods can be used to select large samples of recently-quenched galaxies in the distant Universe.

    Far Infrared and Submillimeter Emission from Galactic and Extragalactic Photo-Dissociation Regions

    Photodissociation Region (PDR) models are computed over a wide range of physical conditions, from those appropriate to giant molecular clouds illuminated by the interstellar radiation field to the conditions experienced by circumstellar disks very close to hot massive stars. These models use the most up-to-date values of atomic and molecular data, the most current chemical rate coefficients, and the newest grain photoelectric heating rates which include treatments of small grains and large molecules. In addition, we examine the effects of metallicity and cloud extinction on the predicted line intensities. Results are presented for PDR models with densities over the range n=10^1-10^7 cm^-3 and for incident far-ultraviolet radiation fields over the range G_0=10^-0.5-10^6.5, for metallicities Z=1 and 0.1 times the local Galactic value, and for a range of PDR cloud sizes. We present line strength and/or line ratio plots for a variety of useful PDR diagnostics: [C II] 158 micron, [O I] 63 and 145 micron, [C I] 370 and 609 micron, CO J=1-0, J=2-1, J=3-2, J=6-5 and J=15-14, as well as the strength of the far-infrared continuum. These plots will be useful for the interpretation of Galactic and extragalactic far-infrared and submillimeter spectra observable with ISO, SOFIA, SWAS, FIRST and other orbital and suborbital platforms. As examples, we apply our results to ISO and ground-based observations of M82, NGC 278, and the Large Magellanic Cloud. Comment: 54 pages, 20 figures, accepted for publication in The Astrophysical Journal.